Rank in Wordlist | Word | Rank in Wordlist | Word |
---|---|---|---|
1 | nga | 26 | Aves |
2 | han | 27 | Gastropoda |
3 | in | 28 | Porifera |
4 | uska | 29 | Lepidoptera |
5 | species | 30 | Stefan |
6 | ni | 31 | Breuning |
7 | ginhulagway | 32 | William |
8 | hadton | 33 | Bryozoa |
9 | An | 34 | Anthozoa |
10 | syahan | 35 | Anura |
11 | ngan | 36 | an |
12 | Insecta | 37 | mga |
13 | Coleoptera | 38 | tuig |
14 | Araneae | 39 | Carl |
15 | Diptera | 40 | Maxillopoda |
16 | Actinopterygii | 41 | John |
17 | Orthoptera | 42 | Reptilia |
18 | Malacostraca | 43 | J. |
19 | Hymenoptera | 44 | kalendaryo |
20 | Annelida | 45 | Simon |
21 | Arachnida | 46 | de |
22 | von | 47 | 1983 |
23 | Diplopoda | 48 | hin |
24 | ha | 49 | M. |
25 | Sarcoptiformes | 50 | Walker |
The table shows the 50 most frequent words of the corpus. In most corpora these top ranks are dominated by stopwords.
Language: Afrikaans
This list is a good starting point for a first stopword list for a language.
Usually a small, balanced corpus is enough to obtain a reliable list of high-frequency words. If the corpus is dominated by a single prominent topic, however, words from that topic will appear even among the top-ranked words.
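As a rough illustration of how such a list can be derived, the following sketch counts whitespace-separated tokens and returns the most frequent ones. The function name `top_words` and the toy sentences are invented for this example and are not part of the corpus tooling; real corpora need proper tokenization.

```python
from collections import Counter


def top_words(sentences, n=5):
    """Return the n most frequent whitespace-separated tokens with their counts."""
    counts = Counter(token for s in sentences for token in s.split())
    return counts.most_common(n)


# Hypothetical toy corpus (invented data): a function word dominates,
# but the topical word "cat" also reaches the top ranks.
corpus = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "a bird sat on the fence",
]

for word, freq in top_words(corpus, n=4):
    print(word, freq)
```

Even in this tiny sample the content word "cat" ranks directly behind "the", which mirrors the topic effect described above: candidates taken from a frequency list should be reviewed before they are used as stopwords.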
SELECT w_id - 100 AS rank_in_wordlist, word FROM words WHERE w_id > 100 ORDER BY w_id LIMIT 50;
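The query can be tried on a small in-memory database. The sketch below assumes that `w_id` values up to 100 are reserved for non-word tokens such as punctuation and that words are numbered from 101 onward in descending frequency order; this is inferred from the `w_id > 100` filter and the `w_id - 100` offset, not stated by the source. The `freq` column and all counts are invented toy values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (w_id INTEGER PRIMARY KEY, word TEXT, freq INTEGER)")

# Assumption: ids 1..100 hold non-word tokens; real words start at 101.
# The frequencies are made up for illustration only.
rows = [
    (1, "!", 900),
    (2, ",", 800),
    (101, "nga", 500),
    (102, "han", 400),
    (103, "in", 300),
]
conn.executemany("INSERT INTO words VALUES (?, ?, ?)", rows)

query = (
    "SELECT w_id - 100 AS rank_in_wordlist, word "
    "FROM words WHERE w_id > 100 ORDER BY w_id LIMIT 50"
)
result = conn.execute(query).fetchall()
print(result)
```

Under these assumptions the reserved ids are skipped and the rank is simply the word id shifted by 100, so no separate ranking step is needed.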
3.4 Sample words for different frequency ranges